SPAMS: A Novel Incremental Approach for Sequential Pattern Mining in Data Streams

Authors

  • Lionel Vinceslas
  • Jean-Emile Symphor
  • Alban Mancheron
  • Pascal Poncelet
Abstract

Mining sequential patterns in data streams is a new and challenging problem for the data mining community, since data arrive sequentially as continuous, rapid, and infinite streams. In this paper, we propose a new on-line algorithm, SPAMS, to address sequential pattern mining in data streams. The algorithm maintains the set of frequent sequential patterns in an automaton-based structure, the SPA (Sequential Pattern Automaton). The sequential pattern automaton can be smaller than the set of frequent sequential patterns by two or more orders of magnitude, which allows us to overcome the combinatorial explosion of sequential patterns. Current results can be output at any time for any user-specified threshold. In addition, taking into account the characteristics of data streams, we propose an approximate method that provides near-optimal results with high probability. Experimental studies show the relevance of the SPA data structure and the efficiency of the SPAMS algorithm on various datasets. Our contribution opens a promising avenue by using an automaton as a data structure for mining frequent sequential patterns in data streams.

Author affiliations:

  • Lionel Vinceslas: Ceregmia, Université des Antilles et de la Guyane, Martinique, France
  • Jean-Emile Symphor: Ceregmia, Université des Antilles et de la Guyane, Martinique, France
  • Alban Mancheron: Lirmm, 161 rue Ada, 34392 Montpellier Cedex 5, France
  • Pascal Poncelet: Lirmm UMR 5506, 161 rue Ada, 34392 Montpellier Cedex 5, France
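To make the abstract's notion of a "frequent sequential pattern" under a user-specified threshold concrete, here is a minimal Python sketch. It is not the SPAMS algorithm and does not build the SPA automaton; it only counts, by brute force, how many sequences contain a candidate pattern as an ordered subsequence. The function names (`is_subsequence`, `support`, `frequent_patterns`), the simplification that each sequence element is a single item rather than an itemset, and the sample data are all assumptions made for illustration.

```python
def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator up to the first match, so each
    # later item of the pattern must be found after the earlier ones.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain `pattern` as a subsequence."""
    if not sequences:
        return 0.0
    hits = sum(1 for s in sequences if is_subsequence(pattern, s))
    return hits / len(sequences)


def frequent_patterns(sequences, candidates, min_support):
    """Keep the candidate patterns whose support reaches `min_support`."""
    return [p for p in candidates if support(p, sequences) >= min_support]


# Hypothetical toy data: four sequences of single items.
sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]
candidates = [["a", "c"], ["a", "b"]]
result = frequent_patterns(sequences, candidates, min_support=0.6)
```

With this toy data, `["a", "c"]` is contained in three of the four sequences and `["a", "b"]` in two, so only the former passes a threshold of 0.6. A streaming method such as the one described in the abstract would avoid rescanning stored sequences like this; the sketch only fixes the definition being mined.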

Similar resources

Incremental Mining of Across-streams Sequential Patterns in Multiple Data Streams

Sequential pattern mining is the mining of data sequences for frequent, time-ordered sequential patterns, and it has wide application. Data streams are streams of data that arrive at high speed. Due to limited memory capacity and the need for real-time mining, the mining results need to be updated in real time. Multiple data streams are the simultaneous arrival of a plurality ...

Incremental Mining of Closed Sequential Patterns in Multiple Data Streams

Sequential pattern mining searches for the relative sequence of events, allowing users to make predictions on discovered sequential patterns. Due to drastically advanced information technology over recent years, data have rapidly changed, growth in data amount has exploded and real-time demand is increasing, leading to the data stream environment. Data in this environment cannot be fully stored...

A Single-scan Algorithm for Mining Sequential Patterns from Data Streams

Sequential pattern mining (SPAM) is one of the most interesting research issues of data mining. In this paper, a new research problem of mining data streams for sequential patterns is defined. A data stream is an unbound sequence of data elements arriving at a rapid rate. Based on the characteristics of data streams, the problem complexity of mining data streams for sequential patterns is more ...

Pure Incremental Approach for Sequential Pattern Mining

In data mining, mining sequential patterns from very large databases is useful in many applications. Most sequential pattern mining algorithms work on static data, meaning the database must not change. But the databases in today’s real-world applications are not static; rather, they are incremental databases. New transactions are added at some intervals of time in databa...

Efficiently Mining High Utility Sequential Patterns in Static and Streaming Data

High utility sequential pattern (HUSP) mining has emerged as a novel topic in data mining. Although some preliminary works have been conducted on this topic, they incur the problem of producing a large search space for high utility sequential patterns. In addition, they mainly focus on mining HUSPs in static databases and do not take streaming data into account, where unbounded data come contin...

Publication date: 2009